Minimum processor instruction for implementing weighted fair queuing and other priority queuing

ABSTRACT

The present invention provides techniques for efficiently determining a minimum or maximum of a plurality of values and the index of the minimum using registers of a processor. The present invention also provides for various processor instructions for determining the minimum/maximum and index of two or more values. The present invention finds particular benefit in implementing heaps and in systems utilizing Weighted Fair Queuing (WFQ).

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a divisional application of U.S. patent application Ser. No. 10/757,587, filed Jan. 15, 2004, and entitled “Minimum processor instruction for implementing weighted fair queuing and other priority queuing”, which claimed priority to U.S. Provisional No. 60/440,026, filed Jan. 15, 2003, and entitled “Minimum Processor Instruction for Implementing Weighted Fair Queuing and Other Priority Queuing,” the entirety of which is incorporated by reference herein.

FIELD OF THE INVENTION

The present invention relates generally to determining a minimum and/or maximum of a multiplicity of values and more particularly to using efficient processor instructions to determine a minimum and/or maximum of the values stored in a plurality of registers of a processor.

BACKGROUND OF THE INVENTION

Various processes have been developed to determine the extreme value(s) (i.e., a minimum or maximum) of a plurality of values. One such method includes a sequential comparison, whereby the values are arranged in a list and each value in the list is compared to the minimum of the previous values of the list. The number of steps needed to perform a sequential comparison of n values is on the order of O(n) steps. Another known sorting process includes maintaining a sorted queue, whereby each of the plurality of values is less than or equal to one of its neighboring values and greater than or equal to its other neighboring value. Accordingly, the extraction of either the minimum or maximum value of the sorted queue is on the order of O(1) steps, while inserting a value into the sorted queue is on the order of O(n) steps.

Another sorting method combining the benefits of both the unsorted and sorted queues includes generating and maintaining a heap data structure. In a minimum heap data structure, each parent node of the heap is less than or equal to its child nodes. This results in the minimum value of the heap being at the root node of the heap. Therefore, the identification of the minimum value in a proper heap is on the order of O(1). However, the extraction or modification of the minimum value from a heap of n values is on the order of O(log n) steps, as is the insertion of a new value into the heap data structure.

Heap data structures and/or other sorting processes often are used in network processors to implement various scheduling processes, such as Weighted Fair Queuing (WFQ) or Generalized Processor Scheduling (GPS), to ensure that each incoming data stream receives a certain portion of the outgoing data stream bandwidth and/or to provide a latency bound for each incoming data stream. For example, a common method to implement WFQ scheduling is to associate a “due timestamp” value with each of the input queues used to buffer the incoming data streams. For each output cycle, a WFQ scheduler selects the input queue with the minimum “due timestamp” to provide a protocol data unit (PDU) for output. During or after the transmission of the PDU (e.g., a data packet) from the selected queue, the scheduler updates the “due timestamp” of the selected queue based at least in part on the assigned bandwidth and the size of the current PDU in the selected queue. Accordingly, a minimum heap data structure or a sorted queue often is used to determine the minimum “due timestamp” of the plurality of input queues as well as the index associated with the queue.

While priority queues, such as sorted queues or heap data structures, often are useful in scheduling processes, the insertion, removal, and/or maintenance of such priority queues often consumes a considerable portion of the processing cycles of a processor implementing the scheduling process. For example, to determine the minimum value in an unsorted queue, at least one compare instruction and one jump instruction typically are performed for each comparison of a value in the queue to the minimum of the previous values. As such, at least 2n instructions are performed to identify the minimum value and/or its index in the queue. Similarly, for heap data structures, the insertion of a new value (such as when the “due timestamp” for an input queue is modified) into the heap data structure or a removal of a value often necessitates a branch instruction and a jump instruction for each comparison of a parent node to a child node. Since O(log n) comparisons typically are performed when inserting/removing a value into/from a heap data structure of n values, the typical insertion/removal operation takes at least 2 log n cycles to perform.

Accordingly, an improved processor instruction, method, and/or system for determining an extreme value of a plurality of values in a heap or other priority queue would be advantageous.

SUMMARY OF THE INVENTION

The present invention mitigates or solves the above-identified limitations in known solutions, as well as other unspecified deficiencies in known solutions. A number of advantages associated with the present invention are readily evident to those skilled in the art, including economy of design and resources, transparent operation, cost savings, etc.

A WFQ protocol data unit (PDU) system typically has several input queues. At each queuing cycle, the scheduling system selects one of the queues for transmission of a packet in the selected queue by providing the packet to an outgoing port or by placing the packet in an output queue. When the queues have bandwidth shares that sum up to one, a WFQ scheduler typically can guarantee that each of the queues receives at least its share of the bandwidth on the outgoing port, where each queue may have a set or dynamic portion of the bandwidth. It also can provide a latency bound for each queue.

A common method for implementing a WFQ PDU scheduling system in a communications processor is by having a “due” timestamp value associated with each of the queues. The scheduler, for each PDU that is transmitted, selects the queue with the minimum “due” timestamp. While transmitting a PDU from the selected queue, the scheduler program updates the “due” timestamp of the selected queue using the queue's share of the bandwidth and the current PDU size.
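By way of illustration only, the selection and update steps described above may be sketched in Python as follows. The class name InputQueue, the function select_and_send, the skipping of empty queues, and the update rule (advancing the “due” timestamp by the PDU size divided by the bandwidth share) are assumptions made for this sketch; the description states only that the update uses the queue's share of the bandwidth and the current PDU size.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class InputQueue:
    share: float                                 # fraction of the outgoing bandwidth
    due: float = 0.0                             # "due" timestamp of the queue
    pdus: deque = field(default_factory=deque)   # pending PDU sizes, in bytes


def select_and_send(queues):
    """Send one PDU from the backlogged queue having the minimum "due" timestamp.

    Returns (queue index, PDU size), or None when every queue is empty.
    """
    # Assumption: queues with nothing to send are not eligible for selection.
    candidates = [i for i, q in enumerate(queues) if q.pdus]
    if not candidates:
        return None
    # min() keeps the first candidate on ties, so the lower index wins.
    idx = min(candidates, key=lambda i: queues[i].due)
    queue = queues[idx]
    size = queue.pdus.popleft()
    # Assumed update rule: a larger PDU or a smaller share pushes "due" further out.
    queue.due += size / queue.share
    return idx, size
```

The call to min() in this sketch is the step that the processor instructions described below are intended to accelerate, since it must yield both the minimum timestamp and the index of the queue holding it.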

In order to determine the minimum “due” timestamp and the index of the corresponding queue, the present invention provides for a processor instruction for determining the minimum of at least two values and the index of the minimum. The processor instruction has as inputs a destination register rD, two source registers rA and rB, and an immediate field (“index”). The destination register and a source register may comprise a common or same register.

In one embodiment, the processor instruction is adapted to manipulate the processor to compare (unsigned) the N low-order bits of rA and rB. If the N low-order bits of rA are less than or equal to the N low-order bits of rB, the processor instruction manipulates the processor to copy at least a portion of the contents of register rA to the destination register rD. Otherwise, the immediate field is concatenated with the N low-order bits of rB and the concatenated value is copied to the destination register rD.

In another embodiment, the value stored in rA is compared to the value stored in rB. If the value in rA is less than or equal to the value in rB, the value in rA is copied to the destination register rD and the value of a MINDEX register remains unchanged. Otherwise, the value of rB is copied to rD and the immediate field is copied to the MINDEX register.

In yet another embodiment, the processor instruction is adapted to manipulate a processor to compare, for each of the two source registers, the value of a first portion of the register with the value of a second portion of the register. The minimum value of the first portion and the second portion of each register is copied to the corresponding portion of the destination register. Likewise, the corresponding portion of the MINDEX register is updated to reflect the index of the minimum values of the source registers.

Heaps often are used to implement traffic shapers in networking equipment. The processor instructions described above can be utilized in such an operation to efficiently determine the minimum of the children of a parent node, and its index, in fewer processor cycles as compared to typical sorting processes using a combination of branch and jump processor instructions. This is especially applicable when implementing variations of heaps having up to N children (N=8, for example) as opposed to two children as in most standard heaps.
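To illustrate where the minimum and its index are needed in a heap having up to N children per node, a minimal Python sketch of the sift-down step follows. The array layout (children of position p stored at positions N*p+1 through N*p+N) and the function names min_child and sift_down are assumptions made for this sketch and are not taken from the description.

```python
def min_child(heap, parent, d):
    """Return (value, position) of the smallest child of `parent`, or None for a leaf."""
    first = d * parent + 1
    children = range(first, min(first + d, len(heap)))
    if not children:
        return None
    # This is the "minimum and index" step: one pass over up to d children.
    pos = min(children, key=lambda i: heap[i])
    return heap[pos], pos


def sift_down(heap, parent, d=8):
    """Restore the min-heap property below `parent` in a d-ary heap stored as a list."""
    while True:
        best = min_child(heap, parent, d)
        if best is None or heap[parent] <= best[0]:
            return
        _, pos = best
        heap[parent], heap[pos] = heap[pos], heap[parent]
        parent = pos
```

In this sketch, each level of the heap requires one minimum-with-index determination over up to eight children, which is the operation that a sequence of the described instructions could perform without a branch per comparison.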

In accordance with one embodiment of the present invention, a processor for determining a minimum value of a plurality of values stored in source registers and determining an index value of a source register having the minimum value is provided. The processor comprises a destination register, a first source register storing a first value, a second source register storing a second value, means for comparing the first value stored in the first source register with the second value stored in the second source register, means for storing the first value in the destination register when the first value is less than or equal to the second value and means for concatenating the index value with the second value into a concatenated value and storing the concatenated value in the destination register when the second value is less than the first value.

In accordance with another embodiment of the present invention, a processor for determining a minimum value of a plurality of values stored in source registers and determining an index value of a source register having the minimum value is provided. The processor comprises means for determining a first minimum value of a first value and a second value, means for determining a second minimum value of a third value and a fourth value, means for storing the first minimum value in a first portion of a first destination register and the second minimum value in a second portion of the first destination register, and means for storing a first index value associated with the first minimum value in a first portion of a second destination register and a second index value associated with the second minimum value in a second portion of the second destination register, wherein the means for determining the first minimum value and the means for determining the second minimum value are adapted to execute in parallel.
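One possible reading of this two-portion arrangement is sketched below in Python. The sketch assumes 16-bit registers split into two 8-bit portions, assumes that the first and third values are the high and low portions of one source register and the second and fourth values are the corresponding portions of another, and assumes that a single immediate index is written to whichever portion of the second destination register corresponds to a winning second-register value. None of these details is fixed by the passage above, and the names lanes, pack and packed_min are hypothetical.

```python
LANE_BITS = 8                      # assumed width of each register portion
LANE_MASK = (1 << LANE_BITS) - 1


def lanes(reg):
    """Split a register into its (high portion, low portion)."""
    return (reg >> LANE_BITS) & LANE_MASK, reg & LANE_MASK


def pack(hi, lo):
    """Combine two portions into one register value."""
    return ((hi & LANE_MASK) << LANE_BITS) | (lo & LANE_MASK)


def packed_min(ra, rb, index, mindex):
    """Take the minimum of each portion of ra and rb independently.

    Returns (destination register, updated index register). The two portion
    comparisons do not depend on each other, so hardware could evaluate them
    in parallel; here they simply run one after the other.
    """
    a_hi, a_lo = lanes(ra)
    b_hi, b_lo = lanes(rb)
    m_hi, m_lo = lanes(mindex)
    d_hi, d_lo = a_hi, a_lo
    if b_hi < a_hi:                # ties keep the first source, as in the scalar case
        d_hi, m_hi = b_hi, index
    if b_lo < a_lo:
        d_lo, m_lo = b_lo, index
    return pack(d_hi, d_lo), pack(m_hi, m_lo)
```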

In accordance with yet another embodiment of the present invention, a method for determining a minimum value and a corresponding index value of a plurality of source registers of a processor is provided. The method comprises the steps of, for each of the plurality of source registers, comparing a value stored in the source register with a value stored in a destination register, concatenating the value stored in the source register with an index value associated with the source register and storing the concatenated value in the destination register when the value stored in the source register is less than the value stored in the destination register, and wherein the destination register initially includes an index value and a value of a first source register of the plurality of source registers.

In accordance with an additional embodiment of the present invention, a customer premise equipment (CPE) is provided. The CPE comprises a network interface operably connected to a first network segment, a network interface operably connected to a second network segment and a processor operably connected to the network interfaces and being adapted to compare a first value stored in a first source register of the processor with a second value stored in a second source register of the processor, store the first value in a first destination register of the processor when the first value is less than or equal to the second value and store the second value in the first destination register of the processor and an index value in a second destination register of the processor when the second value is less than the first value, the index value representing the second source register.

In accordance with one embodiment of the present invention, a processor for determining a maximum value of a plurality of values stored in source registers and determining an index value of a source register having the maximum value is provided. The processor comprises a destination register, a first source register storing a first value, a second source register storing a second value, means for comparing the first value stored in the first source register with the second value stored in the second source register, means for storing the first value in the destination register when the first value is greater than or equal to the second value and means for concatenating the index value with the second value into a concatenated value and storing the concatenated value in the destination register when the second value is greater than the first value.

In accordance with another embodiment of the present invention, a processor for determining a maximum value of a plurality of values stored in source registers and determining an index value of a source register having the maximum value is provided. The processor comprises means for determining a first maximum value of a first value and a second value, means for determining a second maximum value of a third value and a fourth value, means for storing the first maximum value in a first portion of a first destination register and the second maximum value in a second portion of the first destination register, and means for storing a first index value associated with the first maximum value in a first portion of a second destination register and a second index value associated with the second maximum value in a second portion of the second destination register, wherein the means for determining the first maximum value and the means for determining the second maximum value are adapted to execute in parallel.

In accordance with yet another embodiment of the present invention, a method for determining a maximum value and a corresponding index value of a plurality of source registers of a processor is provided. The method comprises the steps of, for each of the plurality of source registers, comparing a value stored in the source register with a value stored in a destination register, concatenating the value stored in the source register with an index value associated with the source register and storing the concatenated value in the destination register when the value stored in the source register is greater than the value stored in the destination register, and wherein the destination register initially includes an index value and a value of a first source register of the plurality of source registers.

In accordance with an additional embodiment of the present invention, a customer premise equipment (CPE) is provided. The CPE comprises a network interface operably connected to a first network segment, a network interface operably connected to a second network segment and a processor operably connected to the network interfaces and being adapted to compare a first value stored in a first source register of the processor with a second value stored in a second source register of the processor, store the first value in a first destination register of the processor when the first value is greater than or equal to the second value and store the second value in the first destination register of the processor and an index value in a second destination register of the processor when the second value is greater than the first value, the index value representing the second source register.

One advantage of the present invention is a reduced processing effort for determining a minimum of a plurality of values. Another advantage of the present invention is the simultaneous determination of both a minimum value and the index of the minimum. Still further features and advantages of the present invention are identified in the ensuing description, with reference to the drawings identified below.

Although described herein with respect to determining minimum values, at least one embodiment of the present invention may be implemented to determine any extreme value or other distinguishable characteristic.

BRIEF DESCRIPTION OF THE DRAWINGS

The purpose and advantages of the present invention will be apparent to those of ordinary skill in the art from the following detailed description in conjunction with the appended drawings in which like reference characters are used to indicate like elements, and in which:

FIG. 1 is a flow diagram illustrating an exemplary “min” processor instruction for determining a minimum of two values and an index associated with the minimum in accordance with at least one embodiment of the present invention.

FIG. 2 is a flow diagram illustrating an exemplary sequence of processor instructions of FIG. 1 for determining a minimum of eight values and an index associated with the minimum in accordance with at least one embodiment of the present invention.

FIG. 3 is a flow diagram illustrating an exemplary “Min” processor instruction for determining a minimum of two values and an index associated with the minimum in accordance with at least one embodiment of the present invention.

FIG. 4 is a flow diagram illustrating an exemplary sequence of processor instructions of FIG. 3 for determining a minimum of a plurality of values and the index of the minimum in accordance with at least one embodiment of the present invention.

FIG. 5 is a flow diagram illustrating an exemplary “MIN” processor instruction for determining a minimum for each of two sets of values and the indexes of the minimums in accordance with at least one embodiment of the present invention.

FIG. 6 is a flow diagram illustrating an exemplary sequence of processor instructions of FIG. 5 for determining a minimum of a plurality of values and the index of the minimum in accordance with at least one embodiment of the present invention.

FIG. 7 is a schematic diagram illustrating an exemplary customer premise equipment (CPE) using Weighted Fair Queuing (WFQ) in accordance with at least one embodiment of the present invention.

FIGS. 8 and 9 are schematic diagrams illustrating an arithmetic logic unit of the CPE of FIG. 7 for determining a minimum for each of two sets of values and the indexes associated with the minimums in accordance with at least one embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The following description is intended to convey a thorough understanding of the present invention by providing a number of specific embodiments and details involving processor instructions for determining minimum values, maximum values and/or other distinguishable characteristics of a plurality of values as well as the index associated with the minimum/maximum. It is understood, however, that the present invention is not limited to these specific embodiments and details, which are exemplary only. It is further understood that one possessing ordinary skill in the art, in light of known systems and methods, would appreciate the use of the invention for its intended purposes and benefits in any number of alternative embodiments, depending upon specific design and other needs.

FIGS. 1-8 illustrate exemplary methods for determining the minimum of a plurality of values stored in registers and/or memory and an index associated with the minimum, as well as exemplary processor instructions for manipulating a processor to determine the minimum and index. However, those skilled in the art can implement the present invention to determine the maximum of the plurality of values and the index of the maximum using the guidelines provided herein. Those skilled in the art also may implement the present invention to determine other distinguishable characteristics of one or more data values, using the guidelines provided herein.

Referring now to FIG. 1, an exemplary implementation of a processor instruction for determining a minimum of two values and the index of the minimum is illustrated in accordance with at least one embodiment of the present invention. In one embodiment, the processor instruction, herein referred to as the “min” instruction, includes four input fields: a field referencing a destination register of a processor; a field referencing a first source register; a field referencing a second source register; and an immediate field for providing the constant value “index”. To illustrate, the Assembly language form of the “min” instruction could be represented as “min rD, rA, rB, index”, where rD is the destination register, rA is the first source register, rB is the second source register, and index is the supplied index value. As used herein, a “source” register includes a register having one or more values to be compared, whereas a “destination” register includes a register for storing values, such as the minimum value of two values resulting from a comparison. A register can be both a source register and a destination register for the same “min” processor instruction.

The flow diagram 100 illustrates an exemplary process performed by a processor executing the “min” instruction. In at least one embodiment, the “min” processor instruction illustrated in FIG. 1 is implemented such that it is executed by a processor in one processor cycle.

In at least one embodiment, the N low-order bits of registers rA and rB (each having a total of S bits) represent values to be compared whereas the remaining high-order bits (i.e., S-1 to N) are irrelevant for comparison purposes. Accordingly, at step 102 the processor performs an unsigned comparison using, for example, a comparator on the N low-order bits of rA and rB (designated as rA[N-1:0] and rB[N-1:0], respectively). If the unsigned value stored in rA[N-1:0] is determined to be less than or equal to the unsigned value stored in rB[N-1:0] (step 104), then the processor is manipulated to store the entire value of rA (i.e., rA[S-1:0]) to the destination register rD at step 106, thereby setting the value stored in rD equal to the value stored in rA.

If rB is determined to be less than rA at step 104, then the supplied index value is concatenated with the N low-order bits of rB at step 108. To illustrate, if the “min” instruction processed is “min rD, rA, rB, 3” and rB had a value “0000 1010” (N=4), the index value of 3 (“0011” in binary) would be concatenated with the four low-order bits of rB (“1010” in binary) resulting in a concatenated binary value of “0011 1010”. The concatenated value is then stored in the destination register rD at step 110.
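The behavior of steps 102-110 may be modeled in software as in the following Python sketch, which uses a four-bit value field and an eight-bit register to match the concatenation shown in the example. The function name min_concat and the parameter names n and s are chosen for this sketch only, and the model says nothing about how a processor would implement the instruction in hardware.

```python
def min_concat(ra, rb, index, n=4, s=8):
    """Model of "min rD, rA, rB, index"; returns the value written to rD.

    ra, rb : unsigned register contents (s bits each)
    index  : immediate index associated with rb
    n      : number of low-order bits holding the value to compare
    """
    value_mask = (1 << n) - 1
    reg_mask = (1 << s) - 1
    # Steps 102 and 104: unsigned comparison of the n low-order bits only.
    if (ra & value_mask) <= (rb & value_mask):
        # Step 106: copy all of rA, so any index already held in its
        # high-order bits travels along with the value.
        return ra & reg_mask
    # Steps 108 and 110: index concatenated with the n low-order bits of rB.
    return ((index << n) | (rb & value_mask)) & reg_mask


# The example from the text: rB = 0000 1010 with index 3 yields 0011 1010,
# here with an rA whose low-order bits (1111) exceed those of rB.
assert min_concat(0b00001111, 0b00001010, 3) == 0b00111010
```

Because the whole of rA is copied when rA holds the smaller value, the instruction can be chained with rD also serving as rA, and the running index is preserved without additional instructions.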

Referring now to FIG. 2, an exemplary implementation of a sequence of “min” instructions to determine a minimum value of a plurality of registers is illustrated. A sequence of “min” instructions as described in FIG. 1 can be implemented to determine both the minimum value of a plurality of registers and the index of the register having the minimum value. To illustrate, the following sequence of “min” instructions can be executed to find the minimum value of eight 8-bit source registers (r1-r8) indexed 0-7 respectively. Each source register contains a value represented by the four low-order bits (i.e., rX[3:0]). The resulting minimum and index are stored in an 8-bit destination register r10.

min r10, r1, r2, 1

min r10, r10, r3, 2

min r10, r10, r4, 3

min r10, r10, r5, 4

min r10, r10, r6, 5

min r10, r10, r7, 6

min r10, r10, r8, 7

sri r11, r10, 4

As illustrated in FIG. 2, during the execution of the first “min” instruction “min r10, r1, r2, 1” (instruction 202), the value stored in the four low-order bits of register r1 (register 222) is compared (step 102, FIG. 1) to the four low-order bits of register r2 (register 224). Since the binary value “0101” of register r1 is less than the binary value “0111” of register r2, in one embodiment, the entire contents (i.e., “0000 0101”) of register r1 are stored in register r10 (register 226) at step 106 (FIG. 1). Accordingly, the register r10 includes the minimum value of the registers r1 and r2 (“0101”) as well as the index of the register having the minimum value, r1 (“0000” or 0).

During the execution of the next “min” instruction “min r10, r10, r3, 2” (instruction 204), the value stored in the four low-order bits of register r3 (register 228) is compared with the minimum value stored in the four low-order bits of register r10 at step 102. In this example, however, the value stored in register r3 is less than the value stored in register r10. Accordingly, the value of the four low-order bits of the register r3 (“0100” in binary) is stored in the four low-order bits of the register r10 and the index (2 or “0010” in binary) is stored in the four high-order bits of register r10 (steps 108 and 110, FIG. 1). Accordingly, the register r10 at this point includes the minimum value (“0100”) of the registers r1-r3 as well as the index (“0010” or 2) associated with the register r3 having the minimum value.

For the next “min” instruction “min r10, r10, r4, 3” (instruction 206), the value stored in the four low-order bits of register r4 (register 230) is compared to the value stored in the four low-order bits of the register r10. In this example, the value stored in register r10 is less than the value stored in register r4. Accordingly, the value of the register r10 remains unchanged, either by skipping the modification of register r10 or by recopying the contents of register r10. For the following “min” instruction “min r10, r10, r5, 4” (instruction 208), the value stored in the four low-order bits of register r5 (register 232) is less than the value stored in register r10. Accordingly, the four low-order bits of register r5 are copied to the four low-order bits of register r10 and the index supplied as part of the “min” instruction (4 or “0100”) is copied to the four high-order bits of the register r10. Accordingly, the register r10 includes the minimum value of the registers r1-r5 as well as the index of the register having the minimum value (i.e., register r5 having an index of 4).

During the execution of the “min” instruction “min r10, r10, r6, 5” (instruction 210), the value stored in the four low-order bits of register r6 (register 234) is determined to be greater than the value stored in register r10. Accordingly, the processor retains the contents of the register r10, where r10 stores both the index and the minimum value of registers r1-r6. During the execution of the next “min” instruction “min r10, r10, r7, 6” (instruction 212), the value stored in the four low-order bits of register r7 (register 236) is determined to be less than the value of register r10. Accordingly, the processor is manipulated to store the value of the four low-order bits of register r7 to the low-order bits of the register r10 as well as store the supplied index (6 or “0110”) into the four high-order bits of the register r10. As a result, the register r10 includes both the minimum value and the index of the minimum of registers r1-r7. Likewise, during the execution of the next “min” instruction “min r10, r10, r8, 7” (instruction 214) the value stored in the four low-order bits of register r8 (register 238) is determined to be less than the value stored in the register r10. Accordingly, the value of the four low-order bits of register r8 is stored in the low-order bits of register r10 and the index value (“0111” or 7) supplied as part of instruction 214 is stored in the high-order bits of register r10.

As illustrated by the preceding sequence of “min” processor instructions 202-214, the “min” instruction, when executed in such a manner, stores in the destination register (r10) both the minimum value thus far and its index. Accordingly, after the sequential execution of the instructions 202-214, the minimum value (“0000”) of the registers r1-r8 is stored in the low-order bits of the register r10 while the index corresponding to the register having the minimum value (register r8 having the index 7) is stored in the four high-order bits of the register r10.

The process implementing the instructions 202-214 then can access the register r10 to obtain the index and minimum value for various purposes. For example, the shift-right-immediate instruction “sri r11, r10, 4” (instruction 216) can be used to store the index value in the register r11 (register 240) by shifting the value of register r10 by four bits to the right.
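The sequence of instructions 202-216 can be traced with the following Python sketch. The contents of r1, r2, r3 and r8 are the values given in the description; the contents of r4 through r7 are not stated, so the values used here are assumed and were chosen only to satisfy the stated comparison outcomes. The function names min_concat, sri and find_min are likewise hypothetical.

```python
def min_concat(ra, rb, index, n=4, s=8):
    """Software model of "min rD, rA, rB, index" (see FIG. 1)."""
    value_mask = (1 << n) - 1
    if (ra & value_mask) <= (rb & value_mask):
        return ra & ((1 << s) - 1)
    return ((index << n) | (rb & value_mask)) & ((1 << s) - 1)


def sri(reg, shift):
    """Shift right immediate, as used by instruction 216."""
    return reg >> shift


def find_min(regs):
    """Run the seven "min" instructions and the final shift over r1-r8."""
    r10 = min_concat(regs[0], regs[1], 1)          # min r10, r1, r2, 1
    for index in range(2, len(regs)):              # min r10, r10, rX, index
        r10 = min_concat(r10, regs[index], index)
    r11 = sri(r10, 4)                              # sri r11, r10, 4
    return r10, r11


# r1, r2, r3 and r8 as described; r4-r7 are assumed values for illustration.
registers = [0b0101, 0b0111, 0b0100, 0b0110, 0b0011, 0b1001, 0b0001, 0b0000]
r10, r11 = find_min(registers)
assert r10 == 0b01110000   # index 7 in the high-order bits, minimum 0000 below
assert r11 == 7
```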

In addition to the convenience of having the minimum value and its index stored in a single register, the “min” instruction also can provide benefit in the form of efficiency. As discussed previously, a network processor in accordance with the present invention can be adapted to execute the “min” processor instruction in a single cycle. For example, to find the minimum and index of eight registers, only seven cycles are needed (corresponding to seven “min” instructions). By comparison, known processes for determining a minimum typically utilize a branch instruction and a jump instruction for each comparison with each instruction typically requiring at least one processor cycle, resulting in a minimum of fourteen cycles to determine the minimum of eight registers.

Referring now to FIG. 3, another exemplary processor instruction for determining a minimum value of a plurality of values is illustrated in accordance with at least one embodiment of the present invention. It will be appreciated that in some instances, the sum of the number N of bits of a source register used to store a value and the number of bits used to represent an index value may be greater than the total number of bits of the destination register used to store the minimum value. For example, the values to be compared could require twelve bits while the number of register values to be compared could require an index having at least five bits, thereby prohibiting the storage of both the minimum value and the index concatenated in the same 16-bit destination register. Accordingly, in one embodiment, the present invention provides for a processor instruction, herein referred to as the “Min” instruction, adapted to determine the minimum value of two source registers, store the minimum in a destination register, and store the index of the register having the minimum value in another destination register, herein referred to as the “MINDEX” register. As with the “min” instruction discussed above, the “Min” instruction can be represented in Assembly language format as “Min rD, rA, rB, index”, where rA, rB are the source registers being compared, rD is the destination register, and index is an immediate field having a constant-value index. The “Min” instruction, in one embodiment, can be executed by a suitable processor in one processor cycle. The execution of the “Min” instruction by a processor is illustrated by the instruction flow diagram 300 of FIG. 3 and as described below.

At step 302, a processor using, for example, a comparator compares the value stored in register rA to the value stored in register rB. If the value of rA is less than or equal to the value of rB (step 304), then the value of rA is stored in the destination register rD at step 306. In the event that the register rA and rD are the same register, the register rA/rD can remain unmodified or the contents of register rA/rD can be recopied.

In the event that the value stored in the register rB is less than the value stored in register rA, then the processor is manipulated to store the value of register rB in the register rD at step 308. At step 310, the index associated with the register rB is moved to the MINDEX register. The MINDEX register preferably includes a special-purpose register whereby the modification of the MINDEX register using general-purpose register instructions is limited or prohibited.
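Steps 302-310, together with the register-width constraint that motivates the separate MINDEX register, may be modeled as in the following Python sketch. The function names fits_concatenated and min_with_mindex are hypothetical, and the MINDEX register is represented simply as a value passed in and returned rather than as a protected special-purpose register.

```python
def fits_concatenated(value_bits, index_bits, register_bits):
    """True when a value and its index can share one destination register."""
    return value_bits + index_bits <= register_bits


def min_with_mindex(ra, rb, index, mindex):
    """Model of "Min rD, rA, rB, index"; returns (new rD, new MINDEX)."""
    if ra <= rb:
        # Steps 304 and 306: rA is the minimum, MINDEX is left unchanged.
        return ra, mindex
    # Steps 308 and 310: rB is the minimum, its index moves to MINDEX.
    return rb, index


# Twelve-bit values with a five-bit index do not fit in a 16-bit register,
# which is the case the "Min" instruction is described as addressing.
assert not fits_concatenated(12, 5, 16)
assert min_with_mindex(0xABC, 0x123, 17, 4) == (0x123, 17)
```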

Referring now to FIG. 4, an exemplary implementation of a sequence of “Min” instructions to determine a minimum value of a plurality of registers and the corresponding index is illustrated. In the following sequence of “Min” processor instructions 402-414, the minimum value stored in 8-bit source registers r1-r8 is determined and stored in the destination register r10 and the index of the source register having the minimum value is stored in the MINDEX register.

Min r10, r1, r2, 1

Min r10, r10, r3, 2

Min r10, r10, r4, 3

Min r10, r10, r5, 4

Min r10, r10, r6, 5

Min r10, r10, r7, 6

Min r10, r10, r8, 7

For the execution of the first “Min” instruction “Min r10, r1, r2, 1” (instruction 402), the value stored in register r1 (register 222) is compared (step 302, FIG. 3) to the value stored in register r2 (register 224). Since in this example the value of register r1 (“0000 0101”) is less than the value of register r2 (“0000 0111”), the value stored in register r1 is stored in the register r10 (step 306, FIG. 3). Because the value of the first source register (register r1) is the minimum of the values of the two source registers, the value of the MINDEX register 418, initialized to a value of 0, remains unmodified.

During the execution of the next “Min” instruction “Min r10, r10, r3, 2” (instruction 404), the value stored in the register r3 (register 228) is compared to the value stored in the register r10 (register 226). Since in this example the value of register r3 is less than the value stored in the register r10, the value of the register r10 is replaced with the value stored in register r3 (step 308, FIG. 3) and the index supplied with instruction 404 (index of 2 or “0010”) is stored in the MINDEX register (step 310, FIG. 3). Accordingly, the register r10 contains the minimum value of the registers r1-r3 and the MINDEX register contains the index of the register having the minimum value (i.e., the index of 2 corresponding to the register r3). The process of comparing the values of the first source register (register r10 in this example) with the remaining source registers r4-r8 (registers 230-238) and storing the minimum value for each comparison in the register r10 and the index of the register having the minimum value in the MINDEX register is repeated for instructions 406-414.

After the execution of instruction 414, the register r10 contains the minimum value of the registers r1-r8 (i.e., the value “0000 0000” of register r8) and the MINDEX register contains the index of the register having the minimum value (i.e., the index of 7 or “0111” corresponding to register r8). Since the MINDEX register preferably includes a special purpose register, the value stored in the MINDEX register can be moved to a general purpose register for subsequent access using, for example, the special processor load instruction “sprl r11, MINDEX” (instruction 416) whereby the value stored in the MINDEX register is loaded to the general purpose register r11 (register 240).
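The sequence of instructions 402-414 amounts to a running minimum with an index, which can be sketched in C as follows. The names `min_result_t` and `min_chain` are hypothetical, and the array `regs` stands in for registers r1-r8 (so `regs[0]` models r1 with index 0); this is an illustrative model under those assumptions rather than the processor's actual behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Result of a chain of "Min" steps: the minimum and its index. */
typedef struct {
    uint8_t  value;    /* models destination register r10 */
    uint32_t mindex;   /* models the MINDEX register      */
} min_result_t;

/* Emulates "Min r10, r1, r2, 1" followed by "Min r10, r10, r(k+1), k"
 * for k = 2 .. n-1. Requires n >= 1. */
static min_result_t min_chain(const uint8_t *regs, size_t n)
{
    assert(n >= 1);
    min_result_t res = { regs[0], 0 };   /* MINDEX initialized to 0 */
    for (size_t k = 1; k < n; k++) {
        if (regs[k] < res.value) {       /* rB < rA: steps 308 and 310 */
            res.value  = regs[k];
            res.mindex = (uint32_t)k;
        }                                /* otherwise rD and MINDEX are unchanged */
    }
    return res;
}
```

With r1 = 5, r2 = 7, and r8 = 0 as in the example (the remaining register values being arbitrary assumptions), the loop ends with a value of 0 and an index of 7.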

As with the “min” instruction discussed with reference to FIGS. 1 and 2, network processors in accordance with the present invention can be adapted to execute the “Min” instruction of FIGS. 3 and 4 in a single processor cycle. As a result, the seven “Min” instructions utilized to determine the minimum value of eight registers could be executed in a total of seven cycles compared to known processes requiring at least fourteen processor cycles as a result of the use of both a compare instruction and a jump instruction for each comparison.

Referring now to FIG. 5, yet another exemplary processor instruction for determining a minimum value of a plurality of values is illustrated in accordance with at least one embodiment of the present invention. In some instances, a source register may have the capacity to store two or more values for comparison. For example, a 32-bit source register could be used to store two 16-bit values. Accordingly, in one embodiment, the present invention provides for a processor instruction, herein referred to as the “MIN” instruction, adapted to manipulate a processor to determine, in parallel, the minimum value of two values in a first source register and the minimum value of two values in a second source register and store the minimum value of the first source register and the minimum value of the second source register in a destination register.

The “MIN” instruction is further adapted to manipulate the processor to store the indexes of the minimum values in another destination register (i.e., the MINDEX register), which preferably includes a special-purpose register. As with the “min” instruction and “Min” instruction discussed above, the “MIN” instruction can be represented in Assembly language format as “MIN rD, rA, rB, index”, where rA, rB are the source registers being compared, rD is the destination register, and index is an immediate field having a constant-value index. The performance of the “MIN” instruction by a processor is illustrated by the instruction flow diagram 500 of FIG. 5. The “MIN” instruction can be executed in a single cycle using a suitable processor, such as the exemplary network processor illustrated in FIGS. 7 and 8.

In at least one embodiment, each of the source registers rA and rB has a first portion for storing a first value and a second portion for storing a second value. The values stored in these portions are herein referred to as rA₁ and rA₂ for register rA and rB₁ and rB₂ for register rB. These register portions preferably are of equal size. Likewise, the destination register rD and the MINDEX register, in one embodiment, each are separated into two portions, where the two portions of the destination register rD are used to store two minimum values and the two portions of the MINDEX register are used to store the indexes of the two minimum values.

In step 502, the value (rA₁) stored in the first portion of the rA register is compared to the value (rA₂) stored in the second portion of the rA register. If the value rA₁ is less than or equal to the value rA₂ (step 504), the value rA₁ is stored in the first portion of the destination register rD (rD₁) at step 506. At step 508, the supplied value for the index is stored in the first portion of the MINDEX register (herein MINDEX₁), indicating that the first portion of the rA register stores the minimum of rA₁ and rA₂. Alternatively, if the value rA₂ is less than the value rA₁, the value rA₂ is stored in rD₁ at step 510 and a value equal to index+1 is stored in MINDEX₁ at step 512, indicating the second portion of the rA register stores the minimum of rA₁ and rA₂.

In step 522, the value (rB₁) stored in the first portion of the rB register is compared to the value (rB₂) stored in the second portion of the rB register. If the value rB₁ is less than or equal to the value rB₂ (step 524), the value rB₁ is stored in the second portion of the destination register rD (rD₂) at step 526. At step 528, the value previously stored in the first portion of the MINDEX register (herein MINDEX₁, PREV) is stored in the second portion of the MINDEX register (herein MINDEX₂), thereby indicating that the first portion of rB stores the minimum of rB₁ and rB₂. Alternatively, if the value rB₂ is less than the value rB₁, the value rB₂ is stored in rD₂ at step 530 and the value previously stored in the second portion of the MINDEX register (herein MINDEX₂, PREV) is stored in MINDEX₂ at step 532, indicating the second portion of the register rB as storing the minimum of rB₁ and rB₂.

In at least one embodiment, the present invention provides for a processor being adapted to perform one or more of steps 502-512 in parallel with the corresponding step of steps 522-532. As described in greater detail with reference to FIGS. 7 and 8, the processor can include, for example, dual comparators and multiplexers for parallel comparisons of the values of two registers (steps 502, 522). Furthermore, the processor can be adapted to complete steps 502-532 in a single cycle, resulting in a significant reduction in processing time to find a minimum of a plurality of values.
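The two lanes of the “MIN” instruction (steps 502-512 and steps 522-532) can be modeled sequentially in C as sketched below. The type `pair16_t` and the function `min_pair` are hypothetical names, and the two 16-bit portions per register are an assumption taken from the 32-bit example above; in hardware the two lanes would run in parallel rather than one after the other.

```c
#include <stdint.h>

/* Hypothetical model of a 32-bit register split into two 16-bit portions. */
typedef struct {
    uint16_t p1;   /* first portion  */
    uint16_t p2;   /* second portion */
} pair16_t;

/* Models "MIN rD, rA, rB, index". *mindex is read for its previous values
 * and then overwritten with the new index pair; the return value models rD. */
static pair16_t min_pair(pair16_t ra, pair16_t rb, uint16_t index, pair16_t *mindex)
{
    pair16_t prev = *mindex;   /* MINDEX1,PREV and MINDEX2,PREV */
    pair16_t rd;

    /* Steps 502-512: the first lane compares the two portions of rA. */
    if (ra.p1 <= ra.p2) { rd.p1 = ra.p1; mindex->p1 = index; }
    else                { rd.p1 = ra.p2; mindex->p1 = (uint16_t)(index + 1); }

    /* Steps 522-532: the second lane compares the two portions of rB and
     * forwards one of the previously stored indexes. */
    if (rb.p1 <= rb.p2) { rd.p2 = rb.p1; mindex->p2 = prev.p1; }
    else                { rd.p2 = rb.p2; mindex->p2 = prev.p2; }

    return rd;
}
```

Copying the previous MINDEX contents into `prev` before either lane writes reflects that both lanes read the register state left by the preceding instruction.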

Referring now to FIG. 6, an exemplary implementation of the “MIN” instruction to determine the minimum of a plurality of values stored in memory is illustrated. In the following example, eight values v0-v7 are stored sequentially in sixteen-bit memory blocks 642-656 of memory 640, wherein the memory blocks 642-656 are sequentially addressed. The register r8 (register 238) stores a value that references the first memory block 642 having the first value v0.

A processor (not shown) loads the first two values v0 and v1 from memory 640 to register r1 (register 222) and the next two values v2 and v3 into register r2 (register 224) using, for example, a load double instruction “Ld.d r1, r2, 0(r8)” (instruction 602). The load double instruction has as inputs two registers and an address field. The load double instruction is adapted to manipulate the processor to load sixty-four bits of data (i.e., four words) from memory 640 starting at the address in memory 640 indicated by the input address field (i.e., 0(r8) or 0 bytes offset of the address value stored in register r8). The first two words (i.e., values v0, v1) are loaded as portions 622 and 624, respectively, of register r1 and the second two words (i.e., values v2 and v3) are loaded into portions 626 and 628, respectively, of register r2. Further, prior to the execution of the sequence of processor instructions 602-614, the index of the memory block 646 having the value v2 (i.e., an index of 2 corresponding to memory block 646) is loaded into the first portion 630 of the MINDEX register 418 and the index of the memory block 648 having the value v3 (i.e., an index of 3 corresponding to memory block 648) is loaded into the second portion 632 of the MINDEX register.

After loading values v0-v3 into registers r1 and r2 and initializing the MINDEX register, the instruction “MIN r3, r1, r2, 0” (instruction 604) is executed by the processor. During execution, the value v0 stored in the first portion 622 of the register r1 is compared with the value v1 stored in the second portion 624 and the minimum value of the values v0, v1 (herein denoted as min(v0, v1)) is stored in the first portion 634 of the register r3 (register 228). Likewise, depending on whether value v0 or v1 is the minimum value, either a value of 0 (the supplied index) or 1 (index+1) is stored in the first portion 630 of the MINDEX register 418.

At the same time that the values v0 and v1 in register r1 are being compared, the value v2 in the first portion 626 of the register r2 is compared with the value v3 stored in the second portion 628 and the minimum of the values v2, v3 (herein denoted as min(v2, v3)) is stored in the second portion 636 of the register r3. Depending on whether the value v2 or v3 is the minimum value, either the value previously stored in the first portion 630 or the value previously stored in the second portion 632 of the MINDEX register is then stored in the second portion 632 of the MINDEX register 418. For example, if the value v2 is the min(v2, v3), then the index of the memory block having value v2 (memory block 646) previously stored in the first portion 630 of the MINDEX register is moved to the second portion 632 of the MINDEX register. Alternatively, if the value v3 is the min(v2, v3), then the second portion 632 remains unmodified since it already contains the index of the memory block having the min(v2, v3) (memory block 648). Accordingly, after the execution of the instruction 604, the min(v0, v1) is stored in the first portion 634 of the register r3 and the index of the memory block in memory 640 having the min(v0, v1) value is stored in the first portion 630 of the MINDEX register. Likewise, the min(v2, v3) is stored in the second portion 636 of the register r3 and the index of the memory block having min(v2, v3) is stored in the second portion 632 of the MINDEX register.

During the execution of the next instruction, “Ld.d r1, r2, 8(r8)” (instruction 606), the next sixty-four bits are loaded from memory 640 into the registers r1, r2 starting at the memory address having an eight byte offset from the memory address stored in register r8, i.e., the memory block 650 having value v4. The values v4 and v5 are loaded into register r1 and values v6 and v7 are loaded into register r2. Accordingly, during the execution of the next “MIN” instruction “MIN r3, r1, r3, 4” (instruction 608), the minimum of the two values stored in register r3 (i.e., the minimum of min(v0, v1) and min(v2, v3), herein referred to as min(v0-v3)), is determined and subsequently stored in the second portion 636 of the register r3. At the same time, the minimum of values v4 and v5 (i.e., min(v4, v5)) stored in register r1 is determined and stored in the first portion 634 of register r3. Further, the index of the memory block having the min(v4, v5) is stored in the first portion 630 of the MINDEX register by storing the supplied index value of 4 in the first portion if the value v4 is the minimum or by storing a value of index+1, or 5, in the first portion when the value v5 is the minimum. The index of the memory block having the min(v0-v3) is stored in the second portion 632 of the MINDEX register either by storing the previous value stored in the first portion 630 in the second portion 632 if the min(v0, v1) is the min(v0-v3) or by retaining the previous value stored in the second portion 632 if the min(v2, v3) is the min(v0-v3).

Since the next values to be compared, v6 and v7, are already available in register r2 as a result of the previous load double instruction (instruction 606), another “MIN” instruction, “MIN r, r2, r3, 6” (instruction 610) can be executed to determine the minimum of values v6, v7 (i.e., min(v6, v7)) as well as the minimum of min(v4, v5) and min(v0-v3), herein referred to as the min(v0-v5). Accordingly, the min(v4, v5) in the first portion 634 of the register r3 is compared with the min(v0-v3) in the second portion 636 to determine the min(v0-v5). Consistent with steps 522-532 of FIG. 5, the min(v0-v5) is subsequently stored in the second portion 636 of the register r3.

Simultaneously, the value v6 in the first portion 626 of the register r2 is compared with the value v7 in the second portion 628 to determine the min(v6, v7), which, consistent with steps 502-512 of FIG. 5, is then stored in the first portion 634 of the register r3. As with the previous “MIN” instruction, the index of the memory block in memory 640 having the min(v6, v7) is stored in the first portion 630 of the MINDEX register by storing the supplied index value of 6 in the first portion if the value v6 is the minimum or by storing a value of index+1, or 7, in the first portion when the value v7 is the minimum. The value previously stored in the first portion 630 is moved to the second portion 632 of the MINDEX register if the min(v4, v5) is the min(v0-v5) or the value stored in the second portion 632 remains unchanged if the min(v0-v3) is the min(v0-v5).

In the same manner, the “MIN” instruction “MIN r3, r0, r3, 0” (instruction 612) is executed resulting in the minimum of values v0-v7 (i.e., min(v0-v7)) being stored in the second portion 636 of register r3 and the index of the memory block having the min(v0-v7) being stored in the second portion 632 of the MINDEX register. However, rather than use a register having values from memory as a first source register, a register r0 (not shown) storing a constant value of zero is used. Accordingly, after the execution of the instruction 612, the bits of first portion 634 of the register r3 are populated with zeros (i.e., min(0,0)), as are the bits of the first portion 630 of the MINDEX register. Therefore, the entire value stored in register r3 is the 32-bit version of the 16-bit value stored in the second portion 636 and the entire value stored in the MINDEX register is the 32-bit version of the 16-bit value stored in the second portion 632. Accordingly, the minimum of values v0-v7, min(v0-v7), can be obtained directly from the register r3 after execution of the instruction 612. Likewise, the index of the memory block storing the min(v0-v7) can be obtained using, for example, the special processor register load instruction “sprl r4, MINDEX” (instruction 614) whereby the index value stored in the MINDEX register is loaded into the general purpose register r4 (register 230) for subsequent access.
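The instruction sequence 602-614 can be traced with the following C sketch, which applies the lane behavior of FIG. 5 four times. The names `pair16_t`, `min_pair`, and `min_of_eight` are hypothetical, the array `v` stands in for memory blocks 642-656, and the destination of instruction 610 is taken to be r3 here as an assumption; the sketch is a software model, not the claimed hardware.

```c
#include <stdint.h>

typedef struct { uint16_t p1, p2; } pair16_t;   /* two 16-bit register portions */

/* Models "MIN rD, rA, rB, index" per steps 502-532 of FIG. 5. */
static pair16_t min_pair(pair16_t ra, pair16_t rb, uint16_t index, pair16_t *mindex)
{
    pair16_t prev = *mindex, rd;
    if (ra.p1 <= ra.p2) { rd.p1 = ra.p1; mindex->p1 = index; }
    else                { rd.p1 = ra.p2; mindex->p1 = (uint16_t)(index + 1); }
    if (rb.p1 <= rb.p2) { rd.p2 = rb.p1; mindex->p2 = prev.p1; }
    else                { rd.p2 = rb.p2; mindex->p2 = prev.p2; }
    return rd;
}

/* Emulates instructions 602-614 for eight values v[0..7]; returns the
 * minimum and writes the index of its memory block to *index_out. */
static uint16_t min_of_eight(const uint16_t v[8], uint16_t *index_out)
{
    pair16_t mindex = { 2, 3 };                  /* pre-loaded indexes of v2 and v3 */
    pair16_t r0 = { 0, 0 };                      /* constant-zero register r0 */
    pair16_t r1 = { v[0], v[1] };                /* Ld.d r1, r2, 0(r8) */
    pair16_t r2 = { v[2], v[3] };
    pair16_t r3 = min_pair(r1, r2, 0, &mindex);  /* MIN r3, r1, r2, 0 */

    r1.p1 = v[4]; r1.p2 = v[5];                  /* Ld.d r1, r2, 8(r8) */
    r2.p1 = v[6]; r2.p2 = v[7];
    r3 = min_pair(r1, r3, 4, &mindex);           /* MIN r3, r1, r3, 4 */
    r3 = min_pair(r2, r3, 6, &mindex);           /* instruction 610 (rD assumed r3) */
    r3 = min_pair(r0, r3, 0, &mindex);           /* MIN r3, r0, r3, 0 */

    *index_out = mindex.p2;                      /* sprl r4, MINDEX */
    return r3.p2;                                /* first portion of r3 is zero */
}
```

For the assumed input {7, 3, 9, 5, 8, 2, 6, 1}, the MINDEX pair evolves as (1, 3), (5, 1), (7, 5), and finally (0, 7), so the sketch reports a minimum of 1 at index 7.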

As FIG. 6 illustrates, a sequence of “MIN” instructions can be implemented to efficiently determine the minimum value of a plurality of values as well as an index associated with the minimum value. Known instruction sequences typically would require a minimum of fourteen instructions to determine the minimum of eight values as each comparison typically is implemented using both a compare and jump instruction. In comparison, only four “MIN” instructions and one sprl load instruction are needed to obtain both the minimum of eight values as well as its index in memory. Accordingly, if each compare instruction and each jump instruction requires one cycle and each “MIN” instruction and sprl load instruction requires one cycle, the instruction sequence of FIG. 6 requires at least nine fewer cycles to complete (five cycles rather than fourteen), assuming the processing time required to load the values from memory is equivalent in either case.

In many implementations, the minimum of a plurality of values is periodically determined as the values stored in memory and/or the source registers change. In such cases, the values may be incremented such that an overflow of the maximum represented integer occurs, resulting in the incremented value wrapping around to zero. Accordingly, in at least one embodiment, the two most significant bits of each value being compared during the execution of the “min” instruction (step 102, FIG. 1), the “Min” instruction (step 302, FIG. 3), and/or the “MIN” instruction (steps 502, 522, FIG. 5) are examined. If both values do not have a value of “00” for the two most significant bits, the values are compared as described above. However, if one of the values has a value of “00” for its two most significant bits and the other value has a value of “11” for its two most significant bits, then the latter value is considered the minimum of the two values.

The special consideration of the two most significant bits can enable a system to increment values without the need for checking for overflow and/or without requiring the values to be updated/modified at each occurrence of overflow, assuming that the maximum increment of a value is no more than one-fourth of the maximum represented integer of the register/memory block. Although a process for examining the two most-significant bits in implementations having value increments limited to one-fourth the maximum represented integer is discussed above, those skilled in the art can implement processes for examining fewer or more most significant bits when the maximum value increments are more than or less than one-fourth of the maximum represented integer, using the guidelines provided herein.
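The two-most-significant-bit rule can be expressed as a comparison function, sketched below in C for 16-bit values. The name `wrap_le16` is hypothetical and the 16-bit width is an assumption; pairs of values other than the “00”/“11” case fall back to an ordinary unsigned comparison, as described above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when a should be treated as less than or equal to b,
 * applying the two-most-significant-bit wraparound rule. */
static bool wrap_le16(uint16_t a, uint16_t b)
{
    unsigned a_top = a >> 14;   /* two most significant bits of a */
    unsigned b_top = b >> 14;   /* two most significant bits of b */

    if (a_top == 0x3 && b_top == 0x0) return true;    /* b has wrapped: a is the minimum */
    if (a_top == 0x0 && b_top == 0x3) return false;   /* a has wrapped: b is the minimum */
    return a <= b;                                    /* ordinary unsigned comparison */
}
```

For example, 0xFFF0 is treated as smaller than 0x0010, because the latter is assumed to be a value that was incremented past the maximum and wrapped around to zero.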

Referring now to FIGS. 7 and 8, an exemplary implementation of a network processor adapted to execute the “min”, “Min”, and/or “MIN” processor instruction is illustrated in accordance with at least one embodiment of the present invention. As discussed above, many network systems utilize Weighted Fair Queuing (WFQ) to “fairly” multiplex multiple input streams into a single output stream. To illustrate, a CPE 740 may be implemented in the network 700 as a gateway between, for example, a wide area network (WAN) 760 and a plurality of network devices 702-716 on a local area network (LAN). The CPE 740 may include any of a variety of CPE devices, such as a bridge, router, hub, switch, telephone modem, cable modem, digital subscriber line (DSL) modem, fiber-optic modem, gateway, and the like. Each of network devices 702-716 (e.g., a desktop computer) provides a stream of data packets for the CPE 740 to process and then output over a single connection to the WAN 760. The CPE 740 therefore includes a plurality of input queues 722-736 in memory 750, each input queue corresponding to one of the network devices 702-716. Each input queue buffers the incoming data from the corresponding network device until the network processor 742 selects the queue for output of a data packet from the selected queue to the WAN 760 via an output interface 748 (e.g., a Utopia interface).

In at least one embodiment, the network processor 742 is adapted to implement a WFQ process in order to ensure that each input queue receives its share of the output bandwidth and to provide a latency bound for each queue. Accordingly, each input queue could be given a priority via a “due” timestamp associated with each input queue, whereby the next packet in the input queue having the minimum “due” timestamp is selected for output during a queuing cycle. After selection, the “due” timestamp of the selected queue is incremented by a predetermined amount. In one manner, the network processor 742 performs the WFQ process by determining the minimum “due” timestamp as well as identifying the queue having this minimum “due” timestamp.
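One queuing cycle of the selection just described can be sketched in C as a linear scan over the “due” timestamps. The function name `wfq_select`, the array `due`, and the single shared `increment` are hypothetical simplifications (an implementation might use a per-queue increment), and the plain comparison here ignores timestamp wraparound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Selects the queue with the minimum "due" timestamp, advances that
 * timestamp by `increment`, and returns the index of the selected queue.
 * Requires n >= 1. */
static size_t wfq_select(uint16_t *due, size_t n, uint16_t increment)
{
    assert(n >= 1);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (due[i] < due[best]) {   /* strictly smaller: earlier queue wins ties */
            best = i;
        }
    }
    due[best] = (uint16_t)(due[best] + increment);   /* may wrap around to zero */
    return best;
}
```

The scan performs the same minimum-and-index determination that the “min”, “Min”, and “MIN” instructions are intended to accelerate.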

The heap data structure is commonly used to implement prioritization of input queues. Once a heap is generated from a plurality of values, such as the “due” timestamps of the input queues, the minimum value is at the root node of the heap. Accordingly, the minimum of the plurality of values can be determined easily by accessing the root node of the heap. However, since the value of the “due” timestamp of a queue is incremented during or after selection, the old value of the “due” timestamp should be removed from the heap, the heap rebuilt, and the new value of the “due” timestamp inserted. As will be appreciated by those skilled in the art, the insertion of a new value into a heap or the rebuilding of a heap after the removal of a value typically includes finding the minimum value of the child nodes of a parent node. The network processor 742 can be adapted to implement one or more of the various processor instructions for determining a minimum value described herein to efficiently determine the minimum value during the maintenance of a heap data structure. The network processor 742 includes, in one embodiment, an arithmetic logic unit (ALU) 744 and register file 746 having a plurality of general purpose registers and/or special purpose registers in order to implement the “min”, “Min”, and “MIN” processor instructions described above.
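The minimum-of-children step in heap maintenance can be seen in the following C sketch of a binary min-heap stored in an array. The names `sift_down` and `replace_root` are hypothetical, and replacing the root in place is a common variant chosen here for brevity rather than the remove, rebuild, and insert sequence described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Restores the min-heap property below position i in heap[0..n-1]. */
static void sift_down(uint16_t *heap, size_t n, size_t i)
{
    for (;;) {
        size_t left = 2 * i + 1;
        size_t right = left + 1;
        if (left >= n) break;   /* node i has no children */

        /* Minimum of the child nodes: the step a "min"-style instruction
         * with an index result would perform in one operation. */
        size_t child = (right < n && heap[right] < heap[left]) ? right : left;

        if (heap[i] <= heap[child]) break;   /* heap property already holds */
        uint16_t tmp = heap[i];
        heap[i] = heap[child];
        heap[child] = tmp;
        i = child;
    }
}

/* Replaces the root (e.g., with an incremented "due" timestamp). Requires n >= 1. */
static void replace_root(uint16_t *heap, size_t n, uint16_t new_value)
{
    assert(n >= 1);
    heap[0] = new_value;
    sift_down(heap, n, 0);
}
```

Each loop iteration needs both the smaller child value and its position, which is the value-plus-index pair that the instructions described herein produce.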

FIGS. 8 and 9 illustrate an exemplary implementation of the ALU 744 in greater detail in accordance with at least one embodiment of the present invention. The network processor 742, in at least one embodiment, is adapted to perform the “MIN rD, rA, rB, index” instruction of FIG. 5 in one processor cycle using the ALU 744 and register file 746. As illustrated in FIG. 8, the ALU 744 includes two multiplexers (MUX) 810, 820 adapted to operate in parallel. The MUX 810 has as inputs the value rA₁ stored in the first portion 832 and the value rA₂ stored in the second portion 834 of the first source register rA (register 802) and the value rD₁ output from the MUX 810 is stored in the first portion 842 of the destination register rD (register 806). The MUX 820 has as inputs the value rB₁ stored in the first portion 836 and the value rB₂ stored in the second portion 838 of the second source register rB (register 804) and the value rD₂ output from the MUX 820 is stored in the second portion 844 of the destination register rD. Further, the ALU 744 includes a comparator 812, an active module 814, and an overflow module 816 for providing control signals to the MUX 810 and a corresponding comparator 822, active module 824, and overflow module 826 for providing control signals to the MUX 820.

In at least one embodiment, the comparator 812 is adapted to compare the values rA₁, rA₂ in the register 802 to determine the minimum of the values rA₁, rA₂. The comparator 812 then provides a signal to the MUX 810 directing the MUX 810 to output the minimum of values rA₁, rA₂ for storage in the first portion 842 of the destination register 806. As discussed above, there may be potential for overflow of the values stored in the portions 832, 834 due to an increment that increases the stored values above the maximum represented integer, resulting in the wraparound of the values to zero.

Accordingly, the overflow module 816, in parallel with the comparison performed by the comparator 812, examines the two most significant bits of the values rA₁, rA₂ to determine whether either of the values has wrapped around to zero. If one of the values rA₁, rA₂ has the value “00” at its two most significant bits and the other value has “11” at its two most significant bits, the value having “11” at the two most significant bits is determined to be the minimum of the values rA₁, rA₂ and the overflow module 816 provides a signal to the MUX 810 indicating the minimum. This signal can be used by the MUX 810 to output the minimum of values rA₁, rA₂ into the first portion 842 of the register 806. It will be appreciated that since the comparator 812 compares the values rA₁, rA₂ without regard to overflow, the signal indicating the minimum provided by the comparator 812 may conflict with the signal indicating the minimum provided by the overflow module 816. In this case, the signal from the overflow module 816 overrides the signal from the comparator 812.

Furthermore, in one embodiment, the least significant bit for each of values rA₁, rA₂ is reserved to indicate an active status associated with the value. To illustrate, recall that the values rA₁, rA₂ can represent the “due” timestamps associated with two input queues of the CPE 740 (FIG. 7). Since one or more of the input queues may be deactivated based on, for example, the loss of a connection to the corresponding network device, it may be desirable to prevent the network processor 742 from attempting to transmit data from the deactivated input queue. Therefore, the least significant bit may be utilized as an active status bit, wherein a value of “0” indicates the input queue is active and a value of “1” indicates the input queue is inactive. To prevent the unintended activation/deactivation of the input queues represented by values rA₁, rA₂, the network processor 742 can be adapted to increment/decrement the values rA₁, rA₂ by multiples of two.

In parallel with the comparator 812 and/or the overflow module 816, the active module 814 examines the least significant bit of the values rA₁, rA₂ to determine if one or more of the active bits are set to “inactive” (i.e., a value of “1”). If one of the values rA₁, rA₂ is inactive, the active module 814 selects the active value as the minimum and provides a signal to the MUX 810 indicating the active value as the minimum value. If both values are inactive, the active module 814 selects the first value rA₁ and provides a signal to the MUX 810 indicating the value rA₁ as the minimum regardless of the actual relation between the values rA₁, rA₂. Since the active module 814 is adapted to examine the active status bit to determine the statuses associated with the values rA₁, rA₂, the signal indicating the minimum provided by the active module 814 may conflict with the signals provided by the overflow module 816 and/or the comparator 812. Accordingly, in one embodiment, the signal from the active module 814 overrides the signals from both the comparator 812 and the overflow module 816.

Based on the signals provided from the comparator 812, the active module 814, and/or the overflow module 816 operating in parallel, the MUX 810 selects the “minimum” of the values rA₁, rA₂ for output to the first portion 842 of the destination register 806. In the same manner, the comparator 822, active module 824, and overflow module 826 examine the values rB₁, rB₂ to determine the “minimum” for output by the MUX 820 to the second portion 844 of the destination register 806.
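The override order among the three modules (active module over overflow module over comparator) can be modeled in C as a chain of checks, as sketched below. The name `select_first` is hypothetical and 16-bit values are assumed; the overflow check follows the “00”/“11” rule given for the two most significant bits in the discussion of wraparound, and the hardware would evaluate the three modules in parallel rather than in sequence.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if the first value (rA1) is to be selected as the "minimum",
 * false if the second value (rA2) is to be selected. */
static bool select_first(uint16_t a1, uint16_t a2)
{
    bool a1_inactive = (a1 & 1u) != 0;   /* least significant bit 1 = inactive */
    bool a2_inactive = (a2 & 1u) != 0;

    /* Active module: highest priority. */
    if (a1_inactive && a2_inactive) return true;   /* both inactive: select rA1 */
    if (a1_inactive) return false;                 /* only rA2 is active */
    if (a2_inactive) return true;                  /* only rA1 is active */

    /* Overflow module: overrides the comparator. */
    unsigned t1 = a1 >> 14;
    unsigned t2 = a2 >> 14;
    if (t1 == 0x3 && t2 == 0x0) return true;       /* rA2 has wrapped around */
    if (t1 == 0x0 && t2 == 0x3) return false;      /* rA1 has wrapped around */

    /* Comparator: plain unsigned comparison, ties select rA1. */
    return a1 <= a2;
}
```

The same function applied to rB₁ and rB₂ would model the second lane formed by the comparator 822, active module 824, and overflow module 826.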

FIG. 9 illustrates an exemplary implementation of a portion of the ALU 744 for storing the indexes of the corresponding values in the destination register rD. As illustrated, the ALU 744 additionally includes two multiplexers 910, 920 for selecting the appropriate index value for storage in the corresponding portion of the MINDEX register 418. The MUX 910 has as inputs the value of the supplied index and the value of index+1. Based on input provided from the comparator 812, the active module 814, and/or the overflow module 816 representing the minimum value determined from the values rA₁ and rA₂ of the register rA, the index select module 912 is adapted to provide a signal to the MUX 910 indicating which of index, index+1 is to be selected for output and storage in the first portion 630 of the MINDEX register 418. To illustrate, if the input to the index select module 912 indicates the value rA₁ is the “minimum”, then the index select module 912 can be adapted to direct the MUX 910 to select the index input for output to the first portion 630, whereas if the value rA₂ is the “minimum”, the index select module 912 can direct the MUX 910 to select the index+1 for output.

In a similar manner, the index select module 922 receives input from the comparator 822, the active module 824, and/or the overflow module 826 indicating the “minimum” of the values rB₁ and rB₂ of the register rB. However, rather than having index and index+1 as inputs, the MUX 920 has as inputs the value stored in the first portion 630 of the MINDEX register and the value stored in the second portion 632 after the execution of the previous “MIN” instruction (i.e., MINDEX₁, PREV and MINDEX₂, PREV, respectively). Based on the indicated “minimum” of rB₁ and rB₂, the index select module 922 is adapted to direct the MUX 920 to output for storage in the second portion 632 either the MINDEX₁, PREV value (rB₁<=rB₂) or the MINDEX₂, PREV value (rB₂<rB₁).

In at least one embodiment, the processing block comprising the comparator 812, the active module 814, the overflow module 816, the index select module 912, and the MUXs 810, 910 operates in parallel with the processing block comprising the corresponding comparator 822, active module 824, overflow module 826, index select module 922, and MUXs 820, 920. As a result, the minimums of two sets of two values, as well as their indexes, can be determined simultaneously, preferably in one processor cycle.

Other embodiments, uses, and advantages of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. The specification and drawings should be considered exemplary only, and the scope of the invention is accordingly intended to be limited only by the following claims and equivalents thereof.

1. A processor for determining a minimum value of a plurality of values stored in source registers and determining an index value of source register having the minimum value, the processor comprising: means for determining a first minimum value of a first value and a second value; means for determining a second minimum value of a third value and a fourth value; means for storing the first minimum value in a first portion of a first destination register and the second minimum value in a second portion of the first destination register; and means for storing a first index value associated with the first minimum value in a first portion of a second destination register and a second index value associated with the second minimum value in a second portion of the second destination register; wherein the means for determining the first minimum value and the means for determining the second minimum value are adapted to execute in parallel.
 2. The processor of claim 1, wherein the means for determining and the means for storing are adapted to execute sequentially within one processor cycle.
 3. The processor of claim 1, wherein each of the first, second, third, and fourth values includes an active status bit to indicate a status, and wherein a value having an active status is less than a value having an inactive status.
 4. The processor of claim 1, wherein a value having “11” as its two most significant bits is less than a value having “00” as its two most significant bits.
 5. The processor of claim 1, wherein the first source register and the destination register comprise a same register.
 6. The processor of claim 1, wherein the second source register and the destination register comprise a same register.
 7. A processor for determining a maximum value of a plurality of values stored in source registers and determining an index value of source register having the maximum value, the processor comprising: means for determining a first maximum value of a first value and a second value; means for determining a second maximum value of a third value and a fourth value; means for storing the first maximum value in a first portion of a first destination register and the second maximum value in a second portion of the first destination register; and means for storing a first index value associated with the first maximum value in a first portion of a second destination register and a second index value associated with the second maximum value in a second portion of the second destination register; wherein the means for determining the first maximum value and the means for determining the second maximum value are adapted to execute in parallel.
 8. The processor of claim 7, wherein the means for determining and the means for storing are adapted to execute sequentially within one processor cycle.
 9. The processor of claim 7, wherein each of the first, second, third, and fourth values includes an active status bit to indicate a status, and wherein a value having an active status is greater than a value having an inactive status.
 10. The processor of claim 7, wherein a value having “11” as its two most significant bits is greater than a value having “00” as its two most significant bits.
 11. The processor of claim 7, wherein the first source register and the destination register comprise a same register.
 12. The processor of claim 7, wherein the second source register and the destination register comprise a same register.